The MSSP NCNM Team

The MSSP NCNM Presentation
  • Professor: Haviland Wright

  • Group 1: Jimmy Ye, Jinyu Li, Yuli Jin

  • Group 2: Daniel Xu, Kayla Choi, Nancy Shen

  • Group 3: Mi Zhang, Boyu Chen, Shicong Wang, Biyao Zhang

  • Group 4: Keliang Xu, Yingjie Wang, James He, Ruining Jia

Our Partners

  • Alison Turner: A Community Development Planner at NCNMEDD and recent MSSP graduate

  • Aidan O’Hara: Working with Alison since late July

  • Allen Razdow: Founder and president of True Engineering Technology, LLC and originator of Truenumbers

Project Background

  • The current situation in NCNM:
Historically, few resources to acquire grants
Trouble successfully administering grants to complete projects
Currently, at a turning point:
  • New pandemic-related dollars are flowing to the region, providing capital to spend on new projects
  • Two major issues: broadband access and outmigration

    • What approaches are used for collecting data:
    Census; the office does not collect much data itself.
    They would like recommendations on the gaps or insufficiencies they see in census data for the region.

    • What variables will we use for this project? On what scales are they measured?
    Demographics (categorical).
    Income (numerical), range: 0-1,000,000,000,000 (unsure if this is the maximum); gross receipts taxed.
    Unemployment rate (numerical).
    GDP (numerical).
    Number of business establishments (numerical).

    Project focus

    The ED-900 form must accompany all EDA grant applications. Here’s an example:
    Ultimate Goal:
    • A Truenumbers database that NCNMEDD and local government staff can access to assist with grant applications.

    • An analysis of the data from the region: census response rates are fairly low, which could lead to data quality issues.

    • If data quality issues exist, come up with supplemental sources of data to improve inferences made about the region.

    Focus for this semester:
    • Truenumbers

    • Dive into what the census is, why it’s important, and how low response rates may pose an issue.

    Our approach

    • Streamline the data acquisition, organization, and analysis process.

    • Using the Tnum package, we created a function to extract county-level census data.

    • Visualize with ggplot to check the relationships between variables.

    • Build models to gain deeper insight into the grant situation in New Mexico.
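    The organization step above can be sketched roughly as follows. This is a minimal illustration in Python, not the team's Tnum function: the "state/county" and "variable:year" text layout, the names `CountyRecord`, `parse_record`, and `group_by_county`, and all values are assumptions made up for the example.

```python
from typing import NamedTuple


class CountyRecord(NamedTuple):
    state: str
    county: str
    variable: str
    year: int
    value: float


def parse_record(subject: str, prop: str, value: str) -> CountyRecord:
    # Hypothetical layout: subject is "state/county", prop is "variable:year".
    state, county = subject.split("/")
    variable, year = prop.split(":")
    return CountyRecord(state, county, variable, int(year), float(value))


def group_by_county(records):
    # Collect parsed records into one list per county.
    grouped = {}
    for record in records:
        grouped.setdefault(record.county, []).append(record)
    return grouped


# Placeholder rows for illustration only; these are not ACS figures.
raw_rows = [
    ("new_mexico/santa_fe", "population:2019", "100"),
    ("new_mexico/santa_fe", "population:2020", "110"),
    ("new_mexico/sandoval", "population:2019", "90"),
]
records = [parse_record(*row) for row in raw_rows]
county_tables = group_by_county(records)
```

    Parsing once into typed fields keeps the later plotting and modeling steps free of string handling.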

    Truenumbers

    Data

    Data Source - ACS

    The ACS (American Community Survey) is a large demographic survey collected throughout the year using mailed questionnaires, telephone interviews, and visits from Census Bureau field representatives to about 3.5 million household addresses annually.

    Parameters

    What we did

    Original Dataset
    After Processing

    EDA Appetizer

    This image shows the overall population of New Mexico as well as the eight counties that we are interested in.

    The figure shows each of the eight counties' share of the observations in our data. Santa Fe and Sandoval are the two counties with the most census observations.
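    A per-county share like the one plotted could be computed along these lines. This is a minimal sketch; the function name `county_shares` and the label counts are placeholders for illustration, not the counts in our dataset.

```python
from collections import Counter


def county_shares(county_labels):
    # Return each county's percentage of all observations.
    counts = Counter(county_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {county: 100.0 * n / total for county, n in counts.items()}


# Placeholder labels, one per observation; not the real observation counts.
labels = ["Santa Fe"] * 4 + ["Sandoval"] * 3 + ["Other"] * 3
shares = county_shares(labels)

# Print counties from largest to smallest share.
for county, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{county}: {pct:.1f}%")
```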

    Comparing race across the eight counties shows that white is the dominant race in all of them. In addition, Sandoval and Santa Fe both have larger populations than the rest.

    The figure shows the differences in per capita income between counties, as well as the year-to-year change in per capita income within each county.

    Conclusion

    What we have done so far:
    Tnum part:
    1. Summarized and understood the Tnum functions;
    2. Became familiar with the Tnum cheatsheet and with making tree graphs.
    Data part:
    1. Extracted 43,796 raw observations;
    2. Designed two functions to split the two columns into seven columns;
    3. Extracted useful information such as time, county, etc. for the EDA team.
    EDA:
    1. The majority of the census data was collected from Sandoval and Santa Fe;
    2. The dominant race in these eight counties is white;
    3. We also found that the gap between the poor and the rich is very large. Population and per capita income appear to be inversely related.
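    One way to check the population and income finding is a Pearson correlation, sketched below. The helper `pearson_r` and the input values are illustrative assumptions, not our county figures. A negative coefficient indicates an inverse association; it does not by itself show strict inverse proportionality, and with only eight counties it should be read cautiously.

```python
import math


def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; not real county data.
population = [10, 20, 30, 40]
per_capita_income = [40, 30, 20, 10]
r = pearson_r(population, per_capita_income)
print(f"Pearson r = {r:.2f}")
```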

    What we are going to do next

  • Come up with a mapping, text, and graph routine so that we can easily re-create graphs in the future.
  • Set a standard for future presentation slides.
  • Move from the current basic analysis to more detailed geolocated maps and graphs, including bar charts, pie charts, etc.
  • Further analyze the data we have, and try to get more numeric values.
  • Design a Tnum database and write articles describing any findings on data quality, to help set a standard for grant applications.
  • Questions?

    Thank you!